Abstract. In this paper we develop a general convergence theory for a class of
quasi-Newton  methods  for  equality  constrained  optimization.  The  theory  is  set  in  the 
framework  of  the  diagonalized  multiplier  method  defined  by  Tapia  and  is  an  extension  of 
the  theory  developed  by  Glad.  We  believe  that  this  framework  is  flexible  and  amenable 
to  convergence  analysis  and  generalizations.  A  key  ingredient  of  a  method  in  this  class  is 
a  multiplier  update.  Our  theory  is  tested  by  showing  that  a  straightforward  application 
gives  the  best  known  convergence  results  for  several  known  multiplier  updates.  Also  a 
characterization  of  q-superlinear  convergence  is  presented.  It  is  shown  that  in  the  special 
case  when  the  diagonalized  multiplier  method  is  equivalent  to  the  successive  quadratic 
programming  approach,  our  general  characterization  result  gives  the  Boggs,  Tolle  and 
Wang  characterization. 

1.  Introduction.  This  paper  considers  a  class  of  quasi-Newton  methods  for  solving 
the  equality  constrained  minimization  problem: 


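$$\text{minimize } f(x) \quad \text{subject to } g(x) = 0 \tag{1.1}$$

where $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}^m$.

The augmented Lagrangian $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}_+ \to \mathbb{R}$ is given by

$$L(x,\lambda,c) = f(x) + g(x)^T \lambda + \frac{c}{2}\, g(x)^T g(x).$$
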
For c equal to zero, the augmented Lagrangian reduces to the standard Lagrangian

$$l(x,\lambda) = f(x) + g(x)^T \lambda.$$

If $x_* \in \mathbb{R}^n$ is such that $\nabla g(x_*)$ is full rank, then a necessary condition for $x_*$ to be a solution of (1.1) is that there exists $\lambda_* \in \mathbb{R}^m$ such that $(x_*,\lambda_*)$ is a solution of the nonlinear system

$$\nabla_x L(x_*,\lambda_*,c) = 0, \tag{1.2.a}$$

$$g(x_*) = 0. \tag{1.2.b}$$


Moreover, in this case $\lambda_*$ will be unique. It may be noted that the constant c does not
affect  condition  (1.2). 

In order to approximate the minimizer $x_*$ we consider the Diagonalized Multiplier Method (DMM), as defined by Tapia [39].

Given $x_0$, $\lambda_0$, $B_0$.

For $k = 0$ until convergence do

$$\lambda_{k+1} = U(x_k,\lambda_k,B_k) \tag{1.3}$$

$$B_k s_k = -\nabla_x L(x_k,\lambda_{k+1},c) \tag{1.4}$$

$$x_{k+1} = x_k + s_k \tag{1.5}$$

$$B_{k+1} = B(x_k,\lambda_k,B_k). \tag{1.6}$$


The matrices $B_k$ in (1.4) and $B_{k+1}$ in (1.6) are intended to be approximations to the Hessian matrix $\nabla_x^2 L(x_*,\lambda_*,c)$. We call U in (1.3) a multiplier update and B in (1.6) an approximate Hessian update. Implicit in the formulation of the DMM is the option of changing U or B at each iteration.
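
The steps (1.3)-(1.6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it is restricted to n = 2 and a single constraint (m = 1), it replaces the convergence test by a fixed iteration count, and the test problem, the names `dmm`, `solve2`, `simple_update` and `exact_hessian`, and the choice of U and B are all hypothetical.

```python
from typing import Callable, List, Tuple

Vec = List[float]
Mat = List[List[float]]


def solve2(B: Mat, r: Vec) -> Vec:
    """Solve the 2x2 linear system B s = r by Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(r[0] * B[1][1] - B[0][1] * r[1]) / det,
            (B[0][0] * r[1] - B[1][0] * r[0]) / det]


def dmm(x: Vec, lam: float, B: Mat, c: float,
        grad_L: Callable[[Vec, float, float], Vec],
        multiplier_update: Callable[[Vec, float, Mat], float],
        hessian_update: Callable[[Vec, float, Mat], Mat],
        iters: int = 50) -> Tuple[Vec, float]:
    """Run steps (1.3)-(1.6) for a fixed number of iterations (n = 2, m = 1)."""
    for _ in range(iters):
        lam_next = multiplier_update(x, lam, B)        # (1.3)
        grad = grad_L(x, lam_next, c)
        s = solve2(B, [-grad[0], -grad[1]])            # (1.4)
        x_next = [x[0] + s[0], x[1] + s[1]]            # (1.5)
        B = hessian_update(x, lam, B)                  # (1.6)
        x, lam = x_next, lam_next
    return x, lam


# Hypothetical test problem: f(x) = x1^2 + x2^2, g(x) = x1 + x2 - 1,
# whose solution is x* = (0.5, 0.5) with multiplier lambda* = -1.
C = 10.0


def g(x: Vec) -> float:
    return x[0] + x[1] - 1.0


def grad_L(x: Vec, lam: float, c: float) -> Vec:
    w = lam + c * g(x)            # grad_x L = grad f + (lam + c g) grad g
    return [2.0 * x[0] + w, 2.0 * x[1] + w]


def simple_update(x: Vec, lam: float, B: Mat) -> float:
    return lam + C * g(x)         # illustrative choice of U


def exact_hessian(x: Vec, lam: float, B: Mat) -> Mat:
    return [[2.0 + C, C], [C, 2.0 + C]]   # Hessian of L in x (constant here)


x_final, lam_final = dmm([0.0, 0.0], 0.0, exact_hessian([], 0.0, []), C,
                         grad_L, simple_update, exact_hessian)
```

On this quadratic example the iterates approach (0.5, 0.5) and the multiplier approaches -1; nothing here should be read as a statement about behavior on general problems.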

On occasion we will refer to a particular choice of quasi-Newton method for the steps (1.4)-(1.6) of the DMM. For example, diagonalized Newton multiplier method would mean that the choice for $B_{k+1}$ in (1.6) is $\nabla_x^2 L(x_{k+1},\lambda_{k+1},c)$, while diagonalized secant multiplier method would emphasize that the quasi-Newton method (1.4)-(1.6) is also a secant method.

If B in (1.6) is a secant update (see Dennis and Schnabel [18] for details on these methods) then the default choice for $y_k$ (the structure of the problem does not suggest a more natural choice) is the choice given by Tapia [39]; namely

$$y_k = \nabla_x L(x_{k+1},\lambda_{k+1},c) - \nabla_x L(x_k,\lambda_{k+1},c). \tag{1.7}$$

Recall that a secant update requires the satisfaction of the secant equation

$$B_{k+1} s_k = y_k. \tag{1.8}$$

In the original formulation of the multiplier method (given independently by Hestenes [28], Powell [33] and Haarhoff and Buys [8]) the multiplier $\lambda_k$ was updated only after $x_k$ had been found which minimized $L(x,\lambda_k,c)$ in x (i.e., satisfied $\nabla_x L(x_k,\lambda_k,c) = 0$). The process of finding such an $x_k$ was left undefined. Suppose that these unconstrained
minimizations  were  performed  via  a  quasi-Newton  method.  Then  in  terms  of  (1.3)-(1.6) 
the  multiplier  method  amounts  to  looping  through  (1.4)-(1.6)  an  infinite  number  of  times 
before  returning  to  the  multiplier  update  (1.3).  Of  course,  in  practice  such  an  approach 
would be impossible and only a finite number of loops of (1.4)-(1.6) (quasi-Newton steps)
could  be  taken  before  returning  to  the  multiplier  update  (1.3).  Tapia  [39]  formally  stated 
(1.3)-(1.6)  and  used  the  adjective  diagonalized  to  describe  this  modified  version  of  the 
multiplier  method  which  gave  the  multiplier  the  same  status  as  the  x-variable.  He  was 
motivated by the feeling that in an effective formulation the multiplier $\lambda$ should be
updated  as  often  as  the  variable  x  and  the  two  update  formulas  should  be  matched  or 
compatible  in  some  sense. 

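The following multiplier updates are well-known and appear throughout the literature:

$$U(x,\lambda,B) = \lambda + c\,g(x) \tag{1.9}$$

$$U(x,\lambda,B) = -\left(\nabla g(x)^T \nabla g(x)\right)^{-1} \nabla g(x)^T \nabla f(x) \tag{1.10}$$

$$U(x,\lambda,B) = \lambda + \left(\nabla g(x)^T B^{-1} \nabla g(x)\right)^{-1} g(x) \tag{1.11}$$

$$U(x,\lambda,B) = \lambda + \left(\nabla g(x)^T B^{-1} \nabla g(x)\right)^{-1} \left(g(x) - \nabla g(x)^T B^{-1} \nabla_x L(x,\lambda,c)\right) \tag{1.12}$$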

We  call  (1.9)  the  Hestenes-Powell  update  since  it  was  the  update  proposed  independently 
by  both  Hestenes  [28]  and  Powell  [33]  when  they  introduced  the  multiplier  method. 
Haarhoff  and  Buys  [8]  used  (1.10)  with  their  version  of  the  multiplier  method.  However, 
it  had  appeared  in  the  literature  numerous  times  before  it  was  used  by  them.  We  call  it 
the projection update since it can be obtained as the least squares solution for $\lambda$ of the linear system $\nabla_x l(x,\lambda) = 0$. The update (1.11) is due to Buys; more will be said about it later on. Following Tapia [39,40] we refer to (1.2) as the extended problem, since it involves both x and $\lambda$ as unknowns. Fletcher [20] calls Newton’s method on the extended problem the Solver method. It is well-known that the diagonalized Newton multiplier method using the multiplier update (1.12) is equivalent to Newton’s method on
the  extended  problem.  It  is  this  equivalence  which  motivated  us  to  call  the  multiplier 
update  (1.12)  the  Newton  multiplier  update.  For  a  background  on  the  multiplier  method 
and  related  issues  see  Bertsekas  [3]. 
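
To make the four updates (1.9)-(1.12) concrete, the sketch below evaluates them for a single constraint (m = 1) in two variables, so that every matrix inverse in the formulas reduces to a scalar division or a 2x2 solve. The test problem (f(x) = x1^2 + x2^2, g(x) = x1 + x2 - 1) and all function names are assumptions made for illustration; the penalty constant c is passed as an extra argument for convenience.

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]


def dot(a: Vec, b: Vec) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def solve2(B: Mat, r: Vec) -> Vec:
    """Solve the 2x2 linear system B s = r by Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(r[0] * B[1][1] - B[0][1] * r[1]) / det,
            (B[0][0] * r[1] - B[1][0] * r[0]) / det]


# Hypothetical problem data (not from the paper).
def grad_f(x: Vec) -> Vec:
    return [2.0 * x[0], 2.0 * x[1]]


def g(x: Vec) -> float:
    return x[0] + x[1] - 1.0


def grad_g(x: Vec) -> Vec:
    return [1.0, 1.0]


def grad_L(x: Vec, lam: float, c: float) -> Vec:
    w = lam + c * g(x)            # grad_x L = grad f + (lam + c g) grad g
    gf, gg = grad_f(x), grad_g(x)
    return [gf[i] + w * gg[i] for i in range(2)]


def hestenes_powell(x: Vec, lam: float, B: Mat, c: float) -> float:
    return lam + c * g(x)                                   # (1.9)


def projection(x: Vec, lam: float, B: Mat, c: float) -> float:
    gg = grad_g(x)
    return -dot(gg, grad_f(x)) / dot(gg, gg)                # (1.10)


def buys(x: Vec, lam: float, B: Mat, c: float) -> float:
    gg = grad_g(x)
    return lam + g(x) / dot(gg, solve2(B, gg))              # (1.11)


def newton(x: Vec, lam: float, B: Mat, c: float) -> float:
    gg = grad_g(x)
    denom = dot(gg, solve2(B, gg))
    correction = g(x) - dot(gg, solve2(B, grad_L(x, lam, c)))
    return lam + correction / denom                         # (1.12)
```

For this problem the solution is x = (0.5, 0.5) with multiplier -1, and each of the four functions returns -1 when evaluated there, whatever nonsingular B is supplied.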

In  the  remainder  of  this  introductory  section  we  will  accomplish  three  objectives. 
Firstly,  we  will  motivate  the  choice  of  the  DMM  as  the  framework  for  our  unified  theory. 
Secondly,  we  will  present  a  fairly  complete  historical  account  of  the  development  of  the 
convergence  theory  for  quasi-Newton  methods  for  constrained  optimization  as  it  relates 
to the theory developed in this paper. This historical account will put our contribution in perspective relative to existing results; it will lend support to our reasons for favoring the DMM formulation; and finally it is needed in its own right since the field has
advanced  significantly  in  the  last  10  years  and  there  is  considerable  confusion  as  to  the 
particular  contributions  of  the  various  authors.  Thirdly,  we  will  briefly  describe  what  the 
reader  will  encounter  in  the  remaining  sections  of  this  paper. 

We now motivate our choice for the DMM framework. The DMM using the Newton multiplier update (1.12) is equivalent to numerous quasi-Newton formulations for problem (1.1) (see Tapia [40]). These equivalent formulations include a structured quasi-Newton method on the extended problem, the popular successive quadratic programming (SQP) quasi-Newton method, and a formulation which Tapia calls the structured multiplier substitution method. The role of the multiplier update is most prominent in the
DMM.  It  is  less  prominent  in  the  extended  problem  formulation,  even  less  prominent  in 
the  SQP  formulation  and  essentially  masked  in  the  structured  multiplier  substitution 
method. 

Since  the  DMM  formulation  clearly  delineates  the  role  of  the  multiplier  update, 
when  coupled  with  a  good  convergence  theory  it  should  allow  one  to  determine  exactly 
what  properties  a  multiplier  update  must  satisfy  for  a  particular  application  or  result. 
This  is  the  path  that  was  taken  by  Fontecilla  [21]  using  the  theory  developed  in  this 
paper. 

The  Newton  multiplier  update  (1.12)  is  the  only  multiplier  update  which  causes  the 
DMM to satisfy linearized constraints (see Theorem 10.2 of Tapia [40] and also Fontecilla [21]). Since the other formulations described above, including the SQP formulation, satisfy linearized constraints it follows that the DMM offers a broader framework
than  do  the  other  frameworks.  Namely,  it  allows  one  to  consider  algorithms  which  do 
not  necessarily  satisfy  linearized  constraints  and  when  linearized  constraints  are  satisfied 
it  gives  an  equivalent  formulation.  We  believe  that  contemporary  approaches  designed 
with  global  behavior  in  mind  will  not  necessarily  satisfy  linearized  constraints.  This  is 
certainly true of the trust region algorithm for constrained optimization recently suggested by Celis, Dennis and Tapia [11]. Their algorithm can be described in the DMM framework. For an interesting class of algorithms which generalize the DMM see Fontecilla [22].

In  terms  of  popularity  there  is  no  doubt  that  the  SQP  formulation  has  won  over 
the  DMM  formulation.  However,  in  terms  of  amenability  to  convergence  analysis  and 
generalization  we  feel  that  the  DMM  may  offer  distinct  advantages. 

We would now like to present a fairly complete historical account of the development of the convergence theory for quasi-Newton methods for constrained optimization as it relates to the present work. Convergence theory for algorithms that use an approximation to the projected Hessian is not of the same flavor as that presented here and will not be discussed. The reader interested in projected Hessian quasi-Newton methods is referred to the recent papers by Coleman and Conn [14], Nocedal and Overton [31], Fontecilla [22] and Byrd [10].

We are concerned with convergence results for quasi-Newton methods for constrained optimization which work with an approximation to the full Hessian with respect to x. Our discussion will center on the following papers: Buys [7], Garcia-Palomares and Mangasarian [23], Han [26], Tapia [39], Powell [34], Byrd [9], Glad [25] and Boggs, Tolle and Wang [5]. While our choice of papers is not exhaustive it is much more than representative and should give a good perspective on the theory presented in this paper.

To begin with, the extended problem is a part of the folklore of constrained optimization theory. The standard convergence theory for Newton’s method can be used to establish local quadratic convergence in $(x,\lambda)$ of Newton’s method applied to the extended problem. Furthermore, the standard Broyden, Dennis and Moré [6] convergence theory for secant methods can be used to establish local q-superlinear convergence in $(x,\lambda)$ of a standard secant method applied to the extended problem as long as this particular secant method does not require positive definiteness of the matrix that is being approximated, e.g. Broyden or PSB (see Dennis and Schnabel [18] for more details on secant methods). These facts were all well-known at the time the popular secant updates for nonlinear equations and unconstrained optimization were being developed. However, it was generally felt that the extended problem approach was unsatisfactory: firstly, because there was no underlying unconstrained minimization problem to use for guidance; and secondly, because the popular DFP and BFGS secant methods were precluded due to the fact that the Jacobian of the full system (the Hessian of the augmented Lagrangian with respect to both x and $\lambda$), while being symmetric, is necessarily not positive definite. We mention in passing that it is ironic that many authors taking directions away from the extended problem (e.g. the SQP or DMM approach) either openly or tacitly returned to it for their convergence analysis. At any rate the stage was set for considerable research activity to shift to the multiplier methods as soon as they were introduced. After all, they did contain a fundamental unconstrained minimization problem.

In an enlightening thesis Buys [7] showed that the multiplier method using the Hestenes-Powell multiplier update (1.9) is the gradient method with step length parameter c on the dual problem. He then proposed the multiplier update (1.11) for use with the multiplier method since the resulting algorithm would be Newton’s method on the dual problem. Convergence results followed from standard theory for these two forms of the multiplier method.

Tapia [39] formally defined the diagonalized multiplier method (DMM) and demonstrated local q-superlinear convergence in $(x,\lambda)$ of several diagonalized secant multiplier methods using the Newton multiplier update (1.12) including the DFP and BFGS secant
methods.  Various  algorithms  which  could  be  classified  as  diagonalized  multiplier  methods 
had  previously  appeared  in  the  literature.  For  example,  several  authors  including  Bard 
and  Greenstadt  [2]  and  Tapia  [37,38]  considered  algorithms  which  were  essentially  the 
diagonalized  Newton  multiplier  method  using  the  Newton  multiplier  update  (1.12).  That 
their  algorithm  was  equivalent  to  Newton’s  method  on  the  extended  problem  was  known 
to  Bard  and  Greenstadt  [2]  and  to  Tapia  [38]  but  not  in  [37].  Miele,  Cragg,  Iyer  and 
Levy  [29]  had  previously  proposed  the  diagonalized  gradient  multiplier  method  using  the 
Hestenes-Powell  multiplier  update  (1.9).  They  gave  no  convergence  analysis  but  included 
a  considerable  amount  of  numerical  experimentation. 

Byrd [9] considered a generalization of the diagonalized Newton multiplier method where $j_k$ Newton steps were taken on the unconstrained minimization problem before $\lambda_k$ was updated to $\lambda_{k+1}$. He proved, among other things, the interesting result that a multiplier update, e.g. (1.11), gives local q-quadratic convergence for the multiplier method if and only if this multiplier update gives local q-quadratic convergence in $(x,\lambda)$ in his modified form of the DMM for any choice of $j_k$ satisfying $j_k \ge 2$. Namely, two Newton steps on the unconstrained minimization subproblem are sufficient to obtain the optimal quadratic convergence rate in $(x,\lambda)$.

Byrd’s result coupled with the known fact that the diagonalized Newton multiplier method using the Newton multiplier update gave local q-quadratic convergence in $(x,\lambda)$ essentially removed the multiplier method (i.e., any implementation that required a large number of quasi-Newton steps in the unconstrained minimization phase) from consideration as an effective algorithm and gave further impetus to the DMM.

About the same time that the DMM was emerging as a viable formulation, the SQP approach was surfacing as an attractive and viable formulation for quasi-Newton methods for constrained optimization. Garcia-Palomares and Mangasarian [23], following Wilson [41], who had presented us with the SQP Newton method (the exact Hessian was used), proposed an SQP quasi-Newton method where the approximation to the Hessian used in the quadratic term was taken as the upper left-hand $n \times n$ submatrix of a quasi-Newton approximation to the $(n+m) \times (n+m)$ Jacobian of the extended problem, i.e., the $(n+m) \times (n+m)$ Hessian with respect to $(x,\lambda)$ of the (augmented) Lagrangian. They established various r-convergence results in $(x,\lambda)$.

Han [26,27] improved and polished the Garcia-Palomares and Mangasarian formulation by using a secant method to directly approximate the Hessian with respect to x of the (augmented) Lagrangian, i.e., the $n \times n$ submatrix referred to above, and presented us with the SQP secant methods as we know them today. He established local q-superlinear convergence in $(x,\lambda)$ for numerous secant updates including the DFP and the BFGS.

Glad [25] independently also defined the formal DMM. He established local convergence results for the diagonalized BFGS secant multiplier method using the Hestenes-Powell (1.9), the projection (1.10) and the Newton (1.12) multiplier updates. Specifically, he obtained local q-linear convergence in $(x,\lambda)$ for the Hestenes-Powell update, local q-linear convergence in x for the projection update, and local q-superlinear convergence in $(x,\lambda)$ for the Newton update. While the results for the Newton update had previously been obtained by Tapia [39], Glad obtained them independently. To our knowledge Glad
[25]  was  the  first  to  give  any  convergence  results  for  a  secant  method  for  constrained 
optimization  which  was  not  equivalent  to  the  SQP  secant  method.  His  work  contributed 
significantly  to  our  understanding  of  the  DMM. 

All  convergence  results  mentioned  above  for  the  DFP  or  the  BFGS  secant  update 
either  carried  with  them  the  assumption  that  the  Hessian  with  respect  to  x  of  the 
Lagrangian  was  positive  definite  or  the  author  worked  with  the  augmented  Lagrangian 
and  assumed  that  c  was  sufficiently  large  so  that  the  Hessian  with  respect  to  x  of  the 
augmented  Lagrangian  was  positive  definite  near  the  solution.  Furthermore,  Han,  Tapia 
and Glad all used the Broyden-Dennis-Moré convergence theory for secant methods and all performed their convergence analysis using a form of the extended problem. It is not surprising then that their results are essentially the same and in particular they all obtained q-superlinear convergence in $(x,\lambda)$ for the diagonalized secant multiplier
methods  using  the  Newton  multiplier  update  or  the  equivalent  SQP  secant  methods. 

Tapia [39,40] considered the convergence rate given by these algorithms for the x variable alone. Clearly, in general a q-rate in $(x,\lambda)$ implies no more than the corresponding r-rate in x (or in $\lambda$). He observed that if in the approximation formula used for the Hessian (1.6) $\lambda$ was replaced by a multiplier estimate which did not depend on $\lambda$, e.g. the projection update (1.10), then the q-superlinear convergence rate also applied to the variable x alone. Glad [25] also observed that if the multiplier update used in the DMM did not depend on $\lambda$ or c, then the convergence result could be stated in x alone. Powell [34] obtained an r-superlinear convergence rate for his modified form of the SQP BFGS secant method. He expressed concern over the fact that he had not been able to obtain a q-superlinear rate in x; but he seemed not to realize that up to this time no one had obtained a q-superlinear rate in x for the SQP BFGS secant method or any similar algorithm. In the same paper Powell derived a condition which implied 2-step q-superlinear convergence in x for an SQP quasi-Newton method. This result fueled the already burning interest in extending the well-known Dennis-Moré [17] characterization of q-superlinear convergence of quasi-Newton methods for unconstrained optimization to a characterization of those quasi-Newton updates which when used in the SQP quasi-Newton method gave q-superlinear convergence in x.

Boggs, Tolle and Wang [5] answered both of the above open questions. Namely, working with the DMM and the Newton multiplier update (1.12) they derived a characterization of those approximate Hessian updates which led to q-superlinear convergence in x.
They  then  demonstrated  the  usefulness  of  their  characterization  by  using  it  to  prove 
that  the  DFP  and  BFGS  secant  updates  (assuming  positive  definiteness  of  the  Hessian 
with  respect  to  x  at  the  solution)  gave  q-superlinear  convergence  in  x  without  any 
modifications  as  had  previously  been  suggested  by  Tapia  [39]. 

The  work  of  Han  [27],  Tapia  [39,40],  Glad  [25]  and  Boggs,  Tolle  and  Wang  [5]  has 
greatly  influenced  the  present  work.  We  now  describe  what  the  reader  will  encounter  in 
the  material  of  the  paper  and  relate  this  material  to  the  existing  work  that  has  just  been 
described.  In  Section  2,  we  list  the  basic  assumptions  and  standard  lemmas  that  will  be 
used  in  the  remainder  of  the  paper.  We  give  a  formal  definition  of  the  notion  of  bounded 
deterioration of approximate Hessian updates used in the DMM. We also describe various properties which a multiplier update may possess. These properties will allow us to
determine  various  convergence  results  based  on  the  theory  developed  in  Section  3. 

In Section 3 we follow the Broyden, Dennis and Moré [6] convergence theory and develop a unified convergence theory for the DMM. This theory requires that B in (1.6) be of bounded deterioration in the sense defined in Section 2. The convergence theory allows us to determine if the use of a particular multiplier update will give local q-linear convergence in $(x,\lambda)$ or the stronger result of local q-linear convergence in x alone.

In Section 4 we apply the tools developed in Sections 2 and 3 to the standard multiplier updates (1.9)-(1.12). We do this as a test and demonstration of the unified theory.
We  do  not  wish  to  imply  that  these  standard  multiplier  updates  should  be  used;  instead 
we  feel  that  the  understanding  gained  from  these  demonstrations  may  be  beneficial  in 
the  design  and  analysis  of  new  algorithms.  In  each  case  we  see  that  the  unified  theory 
gives results which are as good as or better than those that presently exist in the literature.
The  convergence  result  for  the  Buys  (1.11)  multiplier  update  is  new  (Proposition  4.4).  It 
is  satisfying  that  our  theory  (Proposition  4.2)  for  the  Newton  multiplier  update 
(equivalently  SQP)  matches  the  Boggs,  Tolle  and  Wang  [5]  convergence  result  and  is 
superior  to  the  convergence  results  given  by  Han  [26],  Tapia  [39]  and  Glad  [25]. 

In  Section  5  we  derive  two  characterizations  of  those  update  pairs  (U,B)  where  U  is 
a  multiplier  update  and  B  is  an  approximate  Hessian  update  which  lead  to  q-superlinear 
convergence  in  x.  These  characterizations  are  Theorems  5.1  and  5.3.  We  then  show  that 
if  U  is  the  Newton  update  (1.12),  then  Corollary  5.4  gives  the  Boggs,  Tolle  and  Wang 
characterization. However, we obtain the result under slightly less restrictive assumptions than Boggs, Tolle and Wang used. Recently Nocedal and Overton [31] have also obtained the Boggs, Tolle and Wang characterization under these less restrictive assumptions. We emphasize that our two characterization results are for the general DMM and
in  the  special  case  that  the  Newton  multiplier  update  is  used  we  obtain  the  Boggs,  Tolle 
and  Wang  characterization. 

2. Preliminaries. Recall that $x_*$ is a local solution of problem (1.1) with associated multiplier $\lambda_*$. To simplify our notation we let

$$\nabla g(x_*) = \nabla g_*, \tag{2.1.a}$$

$$A_*^c = \nabla_x^2 L(x_*,\lambda_*,c), \tag{2.1.b}$$

$$A_* = \nabla_x^2 l(x_*,\lambda_*). \tag{2.1.c}$$

Throughout  this  paper  we  will  be  making  the  following  assumptions: 

A1. The functions f and g have second derivatives which are Lipschitz continuous in an open neighborhood D of $x_*$.

A2. $\nabla g_*$ has full rank.

A3. $z^T A_* z > 0$ for all $z \ne 0$ satisfying $\nabla g_*^T z = 0$.

A4. $A_*$ is nonsingular.

Assumption A3 is the well-known second order sufficiency condition from constrained optimization. Moreover, it can be shown (Buys [7]) that A2 and A3 are equivalent to asking that $\nabla^2 l(x_*,\lambda_*)$ be nonsingular. Assumption A1 and the nonsingularity of $\nabla^2 l(x_*,\lambda_*)$ are the standard assumptions made when considering the convergence theory for quasi-Newton methods on the extended problem; and as such are minimal assumptions. Many of the results that follow could be proved without assuming A4. However, the generality lost by assuming A4 is not of major concern here.

The  following  lemmas  will  play  a  fundamental  role  in  the  analysis  presented  in  the 
remaining  sections. 

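LEMMA 2.1. There exists $\bar{c} \ge 0$ such that $A_*^c$ is positive definite for all $c \ge \bar{c}$. Moreover, for $c > \bar{c}$, letting A denote $A_*^{\bar{c}}$, $A_c$ denote $A_*^c$ and $\nabla g$ denote $\nabla g_*$, we have

$$(A_c)^{-1} = A^{-1} - A^{-1}\nabla g \left[ (c - \bar{c})^{-1} I + \nabla g^T A^{-1} \nabla g \right]^{-1} \nabla g^T A^{-1} \tag{2.2.a}$$

and

$$\nabla g^T (A_c)^{-1} \nabla g = (c - \bar{c})^{-1}\, \nabla g^T A^{-1} \nabla g \left[ (c - \bar{c})^{-1} I + \nabla g^T A^{-1} \nabla g \right]^{-1} \tag{2.2.b}$$

so that

$$(A_c)^{-1} \to \left[ I - A^{-1}\nabla g \left( \nabla g^T A^{-1} \nabla g \right)^{-1} \nabla g^T \right] A^{-1}, \tag{2.2.c}$$

$$c\, \nabla g^T (A_c)^{-1} \nabla g \to I, \quad \text{and} \tag{2.2.d}$$

$$(A_c)^{-1} \nabla g \to 0 \quad \text{as } c \to \infty. \tag{2.2.e}$$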


Proof. The first statement of this lemma is standard (see Lemma 1.25 of Bertsekas [3]). The statements (2.2.d) and (2.2.e) were known to Glad [25] when he briefly suggested a method of proof. Below we have expanded Glad’s suggested proof to the point where it can be followed with only a fair amount of effort.

Start by writing

$$A_c = A + (c - \bar{c})\, \nabla g \nabla g^T. \tag{2.3}$$

By setting A equal to our A, $U = (c - \bar{c})\nabla g$ and $V = \nabla g$ in equation (13) on page 50 of Ortega and Rheinboldt [32] (Sherman-Morrison-Woodbury formula) we obtain (2.2.a). Multiply (2.2.a) on the left by $\nabla g^T$ and on the right by $\nabla g$ and take the first expression of the form $\nabla g^T A^{-1} \nabla g$ and write it as $\nabla g^T A^{-1} \nabla g\, B^{-1} B$ where $B = (c - \bar{c})^{-1} I + \nabla g^T A^{-1} \nabla g$. Now by factoring out $\nabla g^T A^{-1} \nabla g\, B^{-1}$ from both terms we arrive at (2.2.b). The expressions (2.2.c)-(2.2.e) are direct consequences of (2.2.a) and (2.2.b). □
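
As a sanity check on the Sherman-Morrison-Woodbury step, the following sketch compares the inverse given by formula (2.2.a) with a directly computed inverse, and then looks at the limits (2.2.d) and (2.2.e) for a large c. It assumes a single constraint (so the bracketed matrix is a scalar), a symmetric 2x2 matrix A, and made-up numerical data; the function names are hypothetical.

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]


def dot(a: Vec, b: Vec) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def inv2(M: Mat) -> Mat:
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def matvec(M: Mat, v: Vec) -> Vec:
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


def inverse_direct(A: Mat, dg: Vec, c: float, c_bar: float) -> Mat:
    """Invert A_c = A + (c - c_bar) dg dg^T directly, cf. (2.3)."""
    d = c - c_bar
    Ac = [[A[i][j] + d * dg[i] * dg[j] for j in range(2)] for i in range(2)]
    return inv2(Ac)


def inverse_smw(A: Mat, dg: Vec, c: float, c_bar: float) -> Mat:
    """Evaluate the right-hand side of (2.2.a) for symmetric A and m = 1."""
    Ainv = inv2(A)
    u = matvec(Ainv, dg)                        # A^{-1} grad g
    k = 1.0 / (1.0 / (c - c_bar) + dot(dg, u))  # scalar bracket, inverted
    # A symmetric, so the row vector grad g^T A^{-1} has the same entries as u.
    return [[Ainv[i][j] - k * u[i] * u[j] for j in range(2)] for i in range(2)]


# Made-up data: A plays the role of the Hessian at c_bar, dg of grad g.
A_demo = [[2.0, 0.0], [0.0, 3.0]]
dg_demo = [1.0, 1.0]
direct = inverse_direct(A_demo, dg_demo, 5.0, 0.0)
smw = inverse_smw(A_demo, dg_demo, 5.0, 0.0)
```

With these numbers both routes give the same matrix up to rounding, and for c = 1e6 the quantity c times grad g^T (A_c)^{-1} grad g is within about 1e-6 of 1 while (A_c)^{-1} grad g has entries below 1e-6.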

In this paper we use $|\cdot|$ to denote both the $l_2$ vector norm and the matrix norm that it induces and we use $\|\cdot\|$ to denote an arbitrary but fixed matrix norm. However, since all norms in a finite dimensional space are equivalent we know that for the norm $\|\cdot\|$ there exist $\mu, \eta > 0$ such that for all $A \in \mathbb{R}^{n \times n}$ we have

$$\mu \|A\| \le |A| \le \eta \|A\|. \tag{2.4}$$

The  following  lemma  can  be  found  in  Dennis  and  Schnabel  [18]. 

LEMMA 2.2. Assume $F : \mathbb{R}^n \to \mathbb{R}^n$ is differentiable in the open convex set D, and suppose that for some $w_*$ in D and all $w \in D$

$$|F'(w) - F'(w_*)| \le K |w - w_*| \tag{2.5}$$

for a positive constant K. Then for each u and v in D,

$$|F(v) - F(u) - F'(w_*)(v - u)| \le K \max\{|v - w_*|, |u - w_*|\}\, |v - u|. \tag{2.6}$$

Moreover, if $F'(w_*)$ is invertible, then there is an $\epsilon > 0$, $\alpha > 0$ and $\beta > 0$ such that $\max\{|v - w_*|, |u - w_*|\} < \epsilon$ implies that u and v belong to D and

$$\alpha |v - u| \le |F(u) - F(v)| \le \beta |v - u|. \tag{2.7}$$

The following lemma was first stated formally by Han [26]. It was used implicitly by Tapia [39,40], Glad [25] and Boggs, Tolle and Wang [5].

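LEMMA 2.3. For each fixed value of c there exist positive constants $K_1$ and $K_2$ and an $\epsilon(c) > 0$ such that for all $\lambda \in \mathbb{R}^m$ and for any u, v satisfying $\sigma(u,v) < \epsilon(c)$ we have

$$|\nabla_x L(v,\lambda,c) - \nabla_x L(u,\lambda,c) - A_*^c (v - u)| \le \left[ K_1 \sigma(u,v) + K_2 |\lambda - \lambda_*| \right] |v - u| \tag{2.8}$$

where $\sigma(u,v) = \max\{|u - x_*|, |v - x_*|\}$.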

Proof. Let D in A1 play the role of D in Lemma 2.2 and for a fixed c let $\nabla_x L(\cdot,\lambda_*,c)$ play the role of $F(\cdot)$. We know from assumption A1 that there exists $K_1 > 0$ such that

$$|\nabla_x^2 L(v,\lambda_*,c) - \nabla_x^2 L(u,\lambda_*,c)| \le K_1 |v - u| \tag{2.9}$$

for all $u, v \in D$. A straightforward calculation gives

$$\nabla_x L(v,\lambda,c) - \nabla_x L(u,\lambda,c) - A_*^c (v - u) = \nabla_x L(v,\lambda_*,c) - \nabla_x L(u,\lambda_*,c) - A_*^c (v - u) + \left[ \nabla g(v) - \nabla g(u) \right] (\lambda - \lambda_*). \tag{2.10}$$

Now using the triangle inequality on (2.10) and both (2.6) and (2.7) of Lemma 2.2 we obtain (2.8) with $K_2$ given by $\beta$ in (2.7) and $\epsilon(c)$ given by Lemma 2.2. □

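LEMMA 2.4. There exist positive constants $K_3$ and $K_4$ such that for all $v \in \mathbb{R}^m$

$$K_3 |v| \ge |\nabla g_* v| \ge K_4 |v|. \tag{2.11}$$

Proof. Since $\nabla g_*$ is full rank, $\nabla g_*^T \nabla g_*$ is nonsingular and

$$v = \left( \nabla g_*^T \nabla g_* \right)^{-1} \nabla g_*^T \left( \nabla g_* v \right).$$

The result now follows by choosing

$$K_3 = |\nabla g_*| \quad \text{and} \quad K_4 = \left| \left( \nabla g_*^T \nabla g_* \right)^{-1} \nabla g_*^T \right|^{-1}.$$

□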

The  following  notion  plays  a  fundamental  role  in  the  convergence  theory  developed 
in  the  next  section.  It  is  an  extension  to  constrained  optimization  of  the  notion  of 
bounded  deterioration  originated  by  Dennis  [16]  and  used  extensively  by  Broyden, 
Dennis  and  More  [6]. 

Consider the pair (U, B) where U is a multiplier update and B is an approximate Hessian update (see (1.3) and (1.6)). Also consider $c \ge 0$ such that $A_*^c$ is nonsingular. Suppose that U and B are defined in a neighborhood $N = N_1 \times N_2 \times N_3$ of $(x_*,\lambda_*,A_*^c)$ where $N_3$ contains only nonsingular matrices.

DEFINITION 2.5. The update B is said to be of bounded deterioration (at $A_*^c$ with respect to U) if there exist non-negative constants $\alpha_1$ and $\alpha_2$ such that for each $(x,\lambda,B) \in N$ and for

$$\lambda_+ = U(x,\lambda,B) \tag{2.12.a}$$

$$x_+ = x - B^{-1} \nabla_x L(x,\lambda_+,c) \tag{2.12.b}$$

$$B_+ = B(x,\lambda,B) \tag{2.12.c}$$

we have

$$\|B_+ - A_*^c\| \le \left[ 1 + \alpha_1 \sigma(z,z_+) \right] \|B - A_*^c\| + \alpha_2 \sigma(z,z_+) \tag{2.13}$$

with

$$z = (x,\lambda), \quad z_+ = (x_+,\lambda_+) \tag{2.14.a}$$

and

$$\sigma(z,z_+) = \max\{|x - x_*|, |x_+ - x_*|, |\lambda - \lambda_*|, |\lambda_+ - \lambda_*|\}. \tag{2.14.b}$$

Moreover, we say that the multiplier update $U$ is x-dominated (at $A_*^c$) if there exists a non-negative constant $\phi(c) < 1$ such that for each $(x,\lambda,B) \in N$ we have

$$|(A_*^c)^{-1}\nabla g_*(\lambda_+ - \lambda_*)| \le \phi(c)\,|x - x_*|. \tag{2.15}$$

Furthermore, we say that the multiplier update $U$ is weakly x-dominated (at $A_*^c$ with respect to $B$) if

$$|(A_*^c)^{-1}\nabla g_*| < 1 \tag{2.16.a}$$

and there exists a non-negative constant $\phi(c) < 1$ such that for each $(x,\lambda,B) \in N$ we have

$$|U(x_+,\lambda_+,B_+) - \lambda_*| \le \phi(c)\max(|\lambda_+ - \lambda_*|,\ |x - x_*|). \tag{2.16.b}$$

Finally, we say that the multiplier update $U$ is consistent if it is continuous in $N$ and for all $B \in N_3$ we have

$$\lambda_* = U(x_*,\lambda_*,B). \tag{2.17}$$

Observe that (2.2.e) of Lemma 2.1 says that (2.16.a) will be satisfied for $c$ sufficiently large.


3.  Local  convergence.  This  section  is  devoted  to  the  study  of  the  convergence  of 
the  sequences  generated  by  the  DMM.  Recall  assumptions  A1  and  A2  of  Section  2  and 
Definition  2.5.  The  proofs  of  the  following  two  theorems  will  follow  Broyden,  Dennis  and 
More  [6]  as  closely  as  possible. 

THEOREM 3.1. Consider the update pair $(U,B)$. Suppose that $U$ is x-dominated at $A_*^c$ with constant $\phi = \phi(c)$ and $B$ is of bounded deterioration at $A_*^c$ with respect to $U$. Then for each $r \in (\phi,1)$, there exist positive $\epsilon_x$, $\epsilon_\lambda$ and $\delta$ such that for

$$|x_0 - x_*| < \epsilon_x, \quad |\lambda_0 - \lambda_*| < \epsilon_\lambda, \quad\text{and}\quad \|B_0 - A_*^c\| < \delta$$

the sequence $\{(x_k,\lambda_k)\}$ generated by the DMM (1.3)-(1.6) is well defined and converges to $(x_*,\lambda_*)$. Furthermore, for all $k \ge 0$ we have

$$|x_{k+1} - x_*| \le r\,|x_k - x_*|, \tag{3.1}$$

and $\{B_k\}$ and $\{B_k^{-1}\}$ are bounded.

Proof. Choose positive $\epsilon_x$, $\epsilon_\lambda$ and $\delta$ so that $|x - x_*| < \epsilon_x$, $|\lambda - \lambda_*| < \epsilon_\lambda$ and $\|B - A_*^c\| < 2\delta$ imply that $(x,\lambda,B)$ is contained in the neighborhoods qualifying $U$ to be x-dominated and $B$ to be of bounded deterioration. Further, restrict $\epsilon_x$ so that $\epsilon_x < \epsilon(c)$ where $\epsilon(c)$ is given by Lemma 2.3. Let $K_1$ and $K_2$ also be given by Lemma 2.3, let $K_4$ be given by Lemma 2.4 and let $\alpha_1$ and $\alpha_2$ be as in (2.13). For the norm $\|\cdot\|$, let $\eta$ be given by (2.4). Choose $\gamma_1 > |(A_*^c)^{-1}|$ and $\gamma_2 > |A_*^c|$.

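Further restrict $\epsilon_x$, $\epsilon_\lambda$ and $\delta$ so that

$$(2\alpha_1\delta + \alpha_2)\frac{\epsilon_x}{1-r} \le \delta, \tag{3.2}$$

$$\gamma_1(1+r)\left[(K_1 + K_2\gamma_2K_4^{-1}\phi)\,\epsilon_x + (1+\phi)\,2\eta\delta\right] \le r - \phi, \tag{3.3}$$

and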

€X  <  min(e,  ,72 Kfl  <f>ex )  . 

(3.4)

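Now suppose that $|x_0 - x_*| < \epsilon_x$, $|\lambda_0 - \lambda_*| < \epsilon_\lambda$ and $\|B_0 - A_*^c\| < \delta$. Then $|B_0 - A_*^c| \le \eta\delta < 2\eta\delta$ and since (3.3) implies that

$$2\gamma_1(1+r)\eta\delta \le r, \tag{3.5}$$

the Banach Perturbation Lemma [32] gives

$$|B_0^{-1}| \le (1+r)\gamma_1. \tag{3.6}$$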

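A straightforward argument gives

$$|x_1 - x_*| \le |B_0^{-1}|\,|\nabla_x L(x_0,\lambda_1,c) - \nabla_x L(x_*,\lambda_1,c) - A_*^c(x_0 - x_*)| + |B_0^{-1}|\,|B_0 - A_*^c|\,|x_0 - x_*| + |B_0^{-1}A_*^c|\,|(A_*^c)^{-1}\nabla g_*(\lambda_1 - \lambda_*)|. \tag{3.7}$$

The triangle inequality gives

$$|B_0^{-1}A_*^c| - 1 \le |I - B_0^{-1}A_*^c| \le |B_0^{-1}|\,|B_0 - A_*^c|; \tag{3.8}$$

so it follows that

$$|B_0^{-1}A_*^c| \le 1 + (1+r)\gamma_1 2\eta\delta. \tag{3.9}$$

Also,

$$|A_*^c|^{-1}K_4\,|\lambda_1 - \lambda_*| \le |(A_*^c)^{-1}\nabla g_*(\lambda_1 - \lambda_*)|; \tag{3.10}$$

so it follows that

$$|\lambda_1 - \lambda_*| \le \gamma_2K_4^{-1}\phi\,|x_0 - x_*|. \tag{3.11}$$

Now using Lemma 2.3 with $u = x_0$ and $v = x_*$, (3.9) and (3.11) we obtain from (3.7)

$$|x_1 - x_*| \le \gamma_1(1+r)\left[(K_1 + K_2\gamma_2K_4^{-1}\phi)\,\epsilon_x + (1+\phi)\,2\eta\delta\right]|x_0 - x_*| + \phi\,|x_0 - x_*|. \tag{3.12}$$

Using the bound (3.3) we see that (3.12) leads to $|x_1 - x_*| \le r\,|x_0 - x_*|$. It follows from (3.4) and (3.11) that $|\lambda_1 - \lambda_*| < \epsilon_\lambda$.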

We complete the proof with an induction argument. Assume that $\|B_k - A_*^c\| \le 2\delta$, $|x_{k+1} - x_*| \le r\,|x_k - x_*|$ and $|\lambda_{k+1} - \lambda_*| < \epsilon_\lambda$ for $k = 0,\dots,m-1$. Observing that (3.4) implies in (2.14.b) that $\sigma(z,z_+) \le \epsilon_x$, we obtain from (2.13) that

$$\|B_{k+1} - A_*^c\| - \|B_k - A_*^c\| \le 2\alpha_1\delta\,\epsilon_x r^k + \alpha_2\,\epsilon_x r^k. \tag{3.13}$$

By summing both sides of (3.13) from $k = 0$ to $k = m-1$ we obtain

$$\|B_m - A_*^c\| \le \|B_0 - A_*^c\| + (2\alpha_1\delta + \alpha_2)\frac{\epsilon_x}{1-r}, \tag{3.14}$$

which by (3.2) implies that $\|B_m - A_*^c\| \le 2\delta$. To complete the induction we need to show that $|x_{m+1} - x_*| \le r\,|x_m - x_*|$ and $|\lambda_{m+1} - \lambda_*| < \epsilon_\lambda$. These inequalities are established in exactly the way we established them for $m = 0$. The boundedness of $\{B_k\}$ and $\{B_k^{-1}\}$ follows directly from the inequalities established above. □
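
The step from (3.5) to (3.6) rests on the Banach Perturbation Lemma. The sketch below checks that bound numerically in the spectral norm. The matrices are random stand-ins, and for simplicity $\gamma$ is taken equal to $|A^{-1}|$ rather than strictly larger; both choices are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 5, 0.5

# Hypothetical nonsingular matrix standing in for A_*^c.
A = rng.standard_normal((n, n)) + n * np.eye(n)
gamma = np.linalg.norm(np.linalg.inv(A), 2)        # |A^{-1}|

# Scale a random perturbation E so that gamma * |E| = r / (1 + r) < 1,
# which mirrors the role of condition (3.5).
E = rng.standard_normal((n, n))
E *= (r / (1.0 + r)) / (gamma * np.linalg.norm(E, 2))
B = A + E

inv_norm_B = np.linalg.norm(np.linalg.inv(B), 2)   # |B^{-1}|
bound = (1.0 + r) * gamma                          # right-hand side of (3.6)
```

The lemma gives $|B^{-1}| \le \gamma / (1 - \gamma|E|)$, and with $\gamma|E| = r/(1+r)$ the right-hand side is exactly $(1+r)\gamma$.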

THEOREM 3.2. Consider the update pair $(U,B)$. Suppose that $U$ is consistent and weakly x-dominated at $A_*^c$ with respect to $B$ with constant $\phi(c)$ and $B$ is of bounded deterioration at $A_*^c$ with respect to $U$. Let $\phi = \max(\phi(c),\ |(A_*^c)^{-1}\nabla g_*|)$. Then for each $r \in (\phi,1)$ there exist positive $\epsilon$ and $\delta$ such that for

$$|x_0 - x_*| < \epsilon, \quad |\lambda_0 - \lambda_*| < \epsilon, \quad\text{and}\quad \|B_0 - A_*^c\| < \delta$$

the sequence $\{(x_k,\lambda_k)\}$ generated by the DMM (1.3)-(1.6) is well defined and converges to $(x_*,\lambda_*)$. Furthermore, for all $k \ge 0$ we have

$$\max(|x_{k+1} - x_*|,\ |\lambda_{k+2} - \lambda_*|) \le r\max(|x_k - x_*|,\ |\lambda_{k+1} - \lambda_*|) \tag{3.15}$$

and $\{B_k\}$ and $\{B_k^{-1}\}$ are bounded.

Proof. Choose positive $\epsilon'$ and $\delta'$ so that $|x - x_*| < \epsilon'$, $|\lambda - \lambda_*| < \epsilon'$ and $\|B - A_*^c\| < 2\delta'$ imply that $(x,\lambda,B)$ is contained in the neighborhoods qualifying $U$ to be weakly x-dominated and $B$ to be of bounded deterioration. Further restrict $\epsilon'$ so that $\epsilon' < \epsilon(c)$ where $\epsilon(c)$ is given by Lemma 2.3. Let $K_1$ and $K_2$ also be given by Lemma 2.3 and let $\alpha_1$ and $\alpha_2$ be as in (2.13). For the norm $\|\cdot\|$, let $\eta$ be given by (2.4). Choose $\gamma > |(A_*^c)^{-1}|$.

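Relying on the consistency of $U$ choose $\epsilon < \epsilon'$ and $\delta < \delta'$ so that

$$|U(x,\lambda,B) - \lambda_*| < \epsilon' \tag{3.16}$$

whenever $|x - x_*| < \epsilon$, $|\lambda - \lambda_*| < \epsilon$ and $\|B - A_*^c\| < \delta$. Further restrict $\epsilon$ and $\delta$ so that

$$(2\alpha_1\delta + \alpha_2)\frac{\epsilon}{1-r} \le \delta, \tag{3.17}$$

and

$$\gamma(1+r)\left[(K_1 + K_2)\,\epsilon + 2\eta\delta(1+\phi)\right] \le r - \phi. \tag{3.18}$$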

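Now suppose that $|x_0 - x_*| < \epsilon$, $|\lambda_0 - \lambda_*| < \epsilon$ and $\|B_0 - A_*^c\| < \delta$. Then $|B_0 - A_*^c| \le \eta\delta < 2\eta\delta$ and since (3.18) implies that

$$2\gamma(1+r)\eta\delta \le r, \tag{3.19}$$

the Banach Perturbation Lemma [32] gives

$$|B_0^{-1}| \le (1+r)\gamma. \tag{3.20}$$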

A straightforward argument using (3.7)-(3.9), Lemma 2.3 with $u = x_0$ and $v = x_*$ and (3.18) gives

$$\begin{aligned}
|x_1 - x_*| &\le \gamma(1+r)\left[K_1\epsilon + 2\eta\delta\right]|x_0 - x_*| \\
&\quad + \gamma(1+r)K_2\,\epsilon\,|\lambda_1 - \lambda_*| + \left[1 + (1+r)\gamma 2\eta\delta\right]\phi\,|\lambda_1 - \lambda_*| \\
&\le \left\{\gamma(1+r)\left[(K_1 + K_2)\,\epsilon + 2\eta\delta(1+\phi)\right] + \phi\right\}\max(|x_0 - x_*|,\ |\lambda_1 - \lambda_*|) \\
&\le r\max(|x_0 - x_*|,\ |\lambda_1 - \lambda_*|).
\end{aligned} \tag{3.21}$$

From the definition of weakly x-dominated we have

$$|\lambda_2 - \lambda_*| \le \phi\max(|x_0 - x_*|,\ |\lambda_1 - \lambda_*|). \tag{3.22}$$

Finally, (3.21) and (3.22) show that (3.15) and

$$|x_{k+1} - x_*| < \epsilon \quad\text{and}\quad |\lambda_{k+2} - \lambda_*| < \epsilon' \tag{3.23}$$

hold for $k = 0$. That (3.15) and (3.23) hold for arbitrary $k \ge 0$ can be established by induction in a similar manner. □

Notice that an x-dominated multiplier update leads to the q-linear convergence of $\{(x_k,\lambda_k)\}$ and $\{x_k\}$ but not $\{\lambda_k\}$; while a weakly x-dominated update leads to the q-linear convergence of $\{(x_k,\lambda_{k+1})\}$ but not $\{x_k\}$ or $\{\lambda_k\}$. Of course in both cases we will have r-linear convergence of $\{x_k\}$ and $\{\lambda_k\}$.
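
The distinction drawn above can be seen on a small example. The sketch below runs the iteration (2.12.a)-(2.12.b) on a toy problem with the approximate Hessian frozen at $A_*^c$. Everything specific here is an assumption made for illustration: the test problem, the penalty value, the starting point, and the form $\lambda_+ = \lambda + c\,g(x)$ taken for the Hestenes-Powell update (1.9), which is not restated in this section.

```python
import numpy as np

# Toy problem (illustrative assumption): minimize x1 + x2
# subject to g(x) = x1^2 + x2^2 - 2 = 0; solution x_* = (-1, -1), lambda_* = 1/2.
def grad_f(x):
    return np.array([1.0, 1.0])

def g(x):
    return x[0] ** 2 + x[1] ** 2 - 2.0

def grad_g(x):
    return 2.0 * x

def grad_L(x, lam, c):
    # Gradient in x of L(x, lam, c) = f + lam * g + (c / 2) * g^2.
    return grad_f(x) + grad_g(x) * (lam + c * g(x))

x_star, lam_star, c = np.array([-1.0, -1.0]), 0.5, 10.0
# Idealized fixed B = A_*^c = A_* + c * grad_g_* grad_g_*^T, where A_* = 2 * lam_star * I.
B = 2.0 * lam_star * np.eye(2) + c * np.outer(grad_g(x_star), grad_g(x_star))

x, lam = np.array([-1.05, -0.95]), 0.4
err_x, merit = [], []
for k in range(12):
    lam_next = lam + c * g(x)                     # assumed Hestenes-Powell update
    err_x.append(np.linalg.norm(x - x_star))      # |x_k - x_*|
    merit.append(max(err_x[-1], abs(lam_next - lam_star)))  # quantity in (3.15)
    x = x - np.linalg.solve(B, grad_L(x, lam_next, c))      # step (2.12.b)
    lam = lam_next
```

In this particular run the pair quantity `merit` shrinks over the first few steps, while `err_x` does not: $x_1$ lands essentially on $x_*$ with $\lambda_2$ still in error, so $x_2$ moves away again. This is only one example, not a proof, but it matches the remark that a weakly x-dominated update need not give q-linear convergence of $\{x_k\}$ alone.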


4. A Demonstration of the Convergence Theory. In this section we apply the convergence theory developed in Section 3 to the four standard multiplier updates given by (1.9)-(1.12). As stated in the introduction we do this more as a test and demonstration of the unified theory than as a statement about the updates themselves.

In what follows we will be requiring $B$ to be of bounded deterioration at $A_*^c$ with respect to the particular multiplier update in question. Implicit in the work of Han [26], Tapia [39], Glad [25] and Boggs, Tolle and Wang [5] is a proof that the Broyden and PSB secant updates, and the DFP and the BFGS secant updates in the case of positive definite $A_*^c$, are of bounded deterioration for any particular update. See in particular the comments in the proof of our Corollary 5.5.

PROPOSITION 4.1. Given $r \in (0,1)$ there exists $c(r) \ge 0$ such that for each $c \ge c(r)$ we can find a neighborhood of $(x_*,\lambda_*,A_*^c)$ so that in this neighborhood the projection multiplier update (1.10) is x-dominated with constant $\phi(c) < r$. Hence the DMM using this multiplier update and $c \ge c(r)$ is locally convergent and satisfies (3.1).

Proof. We will give the proof for any consistent update which does not depend on $c$ or $\lambda$ and has continuous partial derivative with respect to $x$, since it is essentially the same as the proof for the special case (1.10).

Choose $c(r)$ at least as large as the threshold in Lemma 2.1 so that $A_*^c$ is invertible. By consistency and the mean-value theorem we have

$$|(A_*^c)^{-1}\nabla g_*(\lambda_+ - \lambda_*)| = |(A_*^c)^{-1}\nabla g_*[U(x,B) - U(x_*,B)]| \le |(A_*^c)^{-1}\nabla g_*|\,|\nabla_x U(x_* + \theta(x - x_*),B)|\,|x - x_*| \tag{4.1}$$

for some $\theta \in (0,1)$. The proof now follows from (2.2.e) of Lemma 2.1 and the continuity of $\nabla_x U(x,B)$. □

A proposition exactly like Proposition 4.1 can be proved for the Newton multiplier update (1.12). However, the Newton update does not require large $c$. In fact we have arbitrarily good local linear convergence with $c = 0$. Recall that $A_*$ denotes $\nabla_x^2 l(x_*,\lambda_*)$.

PROPOSITION 4.2. Given $r \in (0,1)$ there exists a neighborhood of $(x_*,\lambda_*,A_*)$ such that the Newton multiplier update (1.12) is x-dominated with constant $\phi < r$. Hence the DMM using this multiplier update and $c = 0$ is locally convergent and satisfies (3.1).

Proof. A straightforward calculation shows that $U(x,\lambda,B)$ as given by (1.12) is consistent and independent of $\lambda$. So we can write

$$\lambda_+ - \lambda_* = U(x,\lambda,B) - \lambda_* = U(x,\lambda_*,B) - U(x_*,\lambda_*,B). \tag{4.2}$$

Differentiating $U(x,\lambda_*,B)$ with respect to $x$ at $x = x_*$ gives

$$\nabla_x U(x_*,\lambda_*,B) = (\nabla g_*^T B^{-1}\nabla g_*)^{-1}\nabla g_*^T\left[I - B^{-1}A_*\right]. \tag{4.3}$$

From (4.3) we see that $\nabla_x U(x_*,\lambda_*,A_*) = 0$. Using the mean-value theorem, (4.2) and (4.3) we obtain

$$|\lambda_+ - \lambda_*| \le |\nabla_x U(x_* + \theta(x - x_*),\lambda_*,B)|\,|x - x_*| \tag{4.4}$$

for some $\theta \in (0,1)$. The proposition now follows by observing that by continuity the derivative term in (4.4) can be made arbitrarily small for $x$ near $x_*$ and $B$ near $A_*$. □

PROPOSITION 4.3. Given $r \in (0,1)$ there exists $c(r) \ge 0$ such that for each $c \ge c(r)$ we can find a neighborhood of $(x_*,\lambda_*,A_*^c)$ so that in this neighborhood the Hestenes-Powell multiplier update (1.9) is weakly x-dominated with constant $\phi(c) < r$. Hence the DMM using this multiplier update and $c \ge c(r)$ is locally convergent and satisfies (3.15).

Proof. Consider $U$ given by (1.9). Write $\lambda_{++} = U(x_+,\lambda_+,B_+)$. For $i = 1,\dots,m$ there exist $\theta_i \in (0,1)$ such that

$$g_i(x_+) - g_i(x_*) = \nabla g_i(x_* + \theta_i(x_+ - x_*))^T(x_+ - x_*).$$

Let

$$\nabla g_\theta = \left[\nabla g_1(x_* + \theta_1(x_+ - x_*)),\ \dots,\ \nabla g_m(x_* + \theta_m(x_+ - x_*))\right]$$

so that $g(x_+) - g(x_*) = \nabla g_\theta^T(x_+ - x_*)$. We can now write, since $g(x_*) = 0$,

$$\lambda_{++} - \lambda_* = \lambda_+ - \lambda_* + c\,[g(x_+) - g(x_*)] = \lambda_+ - \lambda_* + c\,\nabla g_\theta^T(x_+ - x_*). \tag{4.5}$$

A straightforward calculation shows that

$$x_+ - x_* = \left[I - B^{-1}A_*^c\right](x - x_*) - B^{-1}\nabla g(x)(\lambda_+ - \lambda_*) - B^{-1}\left[\nabla_x L(x,\lambda_*,c) - \nabla_x L(x_*,\lambda_*,c) - A_*^c(x - x_*)\right]. \tag{4.6}$$

Now using Lemma 2.2 with $F(x) = \nabla_x L(x,\lambda_*,c)$, $u = x_*$ and $v = x$, and combining (4.5) with (4.6) we obtain $K_L$ such that

$$|\lambda_{++} - \lambda_*| \le |I - c\,\nabla g_\theta^T B^{-1}\nabla g(x)|\,|\lambda_+ - \lambda_*| + c\,|\nabla g_\theta|\left[\,|I - B^{-1}A_*^c| + K_L|B^{-1}|\,|x - x_*|\,\right]|x - x_*|. \tag{4.7}$$

The proposition now follows from (4.7) by first choosing $c(r)$ large guided by (2.2.d) of Lemma 2.1, then choosing $(x,B)$ close to $(x_*,A_*^c)$ and if needed further restricting the choice of $x$. □

PROPOSITION 4.4. Given $r \in (0,1)$ there exists $c(r) \ge 0$ such that for each $c \ge c(r)$ we can find a neighborhood of $(x_*,\lambda_*,A_*^c)$ so that in this neighborhood the Buys multiplier update (1.11) is weakly x-dominated with constant $\phi(c) < r$. Hence the DMM using this multiplier update and $c \ge c(r)$ is locally convergent and satisfies (3.15).

Proof. Exactly the same argument used in the proof of Proposition 4.3 will lead us to an expression of the form (4.7) with the factor $c$ replaced by $(\nabla g(x_+)^T B_+^{-1}\nabla g(x_+))^{-1}$.

The proof now follows by choosing $c(r)$ large enough so that $|(A_*^c)^{-1}\nabla g_*| < r$ and then choosing $(x,B)$ sufficiently close to $(x_*,A_*^c)$ so that the appropriate factors will lead to a $\phi(c)$ which is less than 1. □

The  proofs  given  above  suggest  that  for  the  Hestenes-Powell  update  c  must  be 
very  large,  for  the  projection  update  and  the  Buys  update  c  should  be  of  the  same  order 
and  need  not  be  particularly  large  and  for  the  Newton  update  c  =0  works  fine. 


5. Superlinear convergence. In this section we will develop a theory for studying the q-superlinear convergence of the sequence $\{x_k\}$ generated by the DMM. This theory is closely related to the theory for unconstrained optimization. Recall that the sequence $\{x_k\}$ is q-superlinearly convergent to $x_*$ if

$$\lim_{k\to\infty} \frac{|x_{k+1} - x_*|}{|x_k - x_*|} = 0. \tag{5.1}$$


According to the Dennis-Moré [17] characterization theory for quasi-Newton methods for unconstrained optimization, in the case of a quasi-Newton method applied to the unconstrained minimization of the functional $L(x,\lambda_*,c)$ (equivalently the idealized DMM where the choice for $\lambda_k$ is $\lambda_*$), a necessary and sufficient condition for q-superlinear convergence is

$$\lim_{k\to\infty} \frac{|(B_k - A_*^c)s_k|}{|s_k|} = 0, \tag{5.2}$$

assuming convergence of the iterates.

It seems reasonable that (5.2) will play an important role in our characterization of q-superlinear convergence in constrained optimization and that by itself will not imply q-superlinear convergence. In fact, it is not surprising that the additional condition that is needed is that the multipliers $\lambda_k$ converge to $\lambda_*$ sufficiently fast, i.e.

$$\lim_{k\to\infty} \frac{|\lambda_{k+1} - \lambda_*|}{|s_k|} = 0. \tag{5.3}$$


Recall that the sequences $\{x_k\}$ and $\{\lambda_k\}$ are generated by the DMM (1.3)-(1.6). In addition to assumptions A1-A4 we will assume

A5. The iterates $x_k \in D$, and $\lim_{k\to\infty} x_k = x_*$.

The following is our first characterization of q-superlinear convergence of the sequence $\{x_k\}$.

THEOREM 5.1. Any two of (5.1), (5.2), or (5.3) imply the third.

Proof. From (1.4) we can write

$$-\nabla_x L(x_{k+1},\lambda_*,c) = -\left[\nabla_x L(x_{k+1},\lambda_*,c) - \nabla_x L(x_k,\lambda_*,c) - A_*^c s_k\right] + \nabla g_k(\lambda_{k+1} - \lambda_*) + (B_k - A_*^c)s_k. \tag{5.4}$$

Let $F(x) = \nabla_x L(x,\lambda_*,c)$. Then $F(x_*) = 0$, and $F'(x_*) = A_*^c$. From Lemma 2.2 there exist positive $\beta$ and $\alpha$ such that

$$\beta\,|x_{k+1} - x_*| \ge |\nabla_x L(x_{k+1},\lambda_*,c)| \ge \alpha\,|x_{k+1} - x_*| \tag{5.5}$$

for $k$ sufficiently large. Also from Lemma 2.3 there exists a positive constant $K_1$ such that

$$|\nabla_x L(x_{k+1},\lambda_*,c) - \nabla_x L(x_k,\lambda_*,c) - A_*^c s_k| \le K_1\sigma(x_k,x_{k+1})\,|s_k|. \tag{5.6}$$

Now by dividing (5.4) by $|s_k|$ and observing (5.6) we see that any two of the following three statements imply the third:

$$\lim_{k\to\infty} \frac{|\nabla_x L(x_{k+1},\lambda_*,c)|}{|s_k|} = 0, \tag{5.7}$$

$$\lim_{k\to\infty} \frac{|\nabla g_k(\lambda_{k+1} - \lambda_*)|}{|s_k|} = 0, \tag{5.8}$$

$$\lim_{k\to\infty} \frac{|(B_k - A_*^c)s_k|}{|s_k|} = 0. \tag{5.9}$$

Inequality (5.5) shows that (5.7) is equivalent to

$$\lim_{k\to\infty} \frac{|x_{k+1} - x_*|}{|s_k|} = 0. \tag{5.10}$$

A fairly straightforward argument using the triangle inequality can be used to show that (5.1) is equivalent to (5.10). Lemma 2.4 shows that (5.8) is equivalent to (5.3). Finally, (5.9) is exactly (5.2). □

Our second characterization of q-superlinear convergence of the sequence $\{x_k\}$ will use the projection operator onto the tangent space of the constraints, i.e.

$$P(x) = I - \nabla g(x)\left[\nabla g(x)^T\nabla g(x)\right]^{-1}\nabla g(x)^T.$$

Let $P_* = P(x_*)$, and $P_k = P(x_k)$. Before stating the next theorem we need a technical result.
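
For readers who want to experiment with the projection operator, here is a minimal sketch that forms $P$ from a given constraint Jacobian. The function name `tangent_projector` and the sample constraint are hypothetical choices made for this illustration.

```python
import numpy as np

def tangent_projector(grad_g):
    """Return P = I - G (G^T G)^{-1} G^T for an n-by-m full-rank matrix G."""
    n = grad_g.shape[0]
    return np.eye(n) - grad_g @ np.linalg.solve(grad_g.T @ grad_g, grad_g.T)

# Hypothetical example with n = 3, m = 1: the gradient of
# g(x) = x1^2 + x2^2 + x3^2 - 1 evaluated at x = (0, 0, 1).
grad_g = np.array([[0.0], [0.0], [2.0]])
P = tangent_projector(grad_g)
```

For this sample gradient $P$ is the diagonal matrix with entries 1, 1, 0: it keeps the two tangent directions and removes the normal direction, so $P\,\nabla g = 0$ as used later in the proof of Theorem 5.3.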

LEMMA 5.2. Let $H_c : R^n \to R^n$ be a function defined by

$$H_c(x) = P(x)\nabla_x l(x,\lambda_*) + c\,\nabla g_*\,g(x)$$

for $x \in D$. Then $H_c$ is continuously differentiable in $D$,

$$H_c(x_*) = 0, \tag{5.11}$$

and

$$H_c'(x_*) = P_*A_* + c\,\nabla g_*\nabla g_*^T. \tag{5.12}$$

Moreover, if $c \ne 0$ then $H_c'(x_*)$ is nonsingular.

Proof. From assumptions A1-A4 we have that $H_c$ is continuously differentiable in $D$, and (5.11) holds. By differentiating $H_c$ and evaluating $H_c'$ at $x_*$ we get (5.12). The nonsingularity of $H_c'(x_*)$ is due to the following fact. Let $d \ne 0$ and

$$(P_*A_* + c\,\nabla g_*\nabla g_*^T)\,d = 0.$$

Since $\nabla g_*$ has full column rank we have

$$P_*A_*d = 0 \quad\text{and}\quad \nabla g_*^T d = 0.$$

Therefore, $P_*d = d$ and $d^TA_*d = 0$, which is a contradiction since $d^TA_*d > 0$ for all $d \ne 0$ such that $\nabla g_*^T d = 0$. □
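
The role of $c \ne 0$ in Lemma 5.2 can be checked on a two-dimensional example. In the sketch below the matrices standing in for $A_*$ and $\nabla g_*$ and the helper name `H_prime` are assumptions chosen for illustration; the formula itself is (5.12).

```python
import numpy as np

def H_prime(A_star, grad_g, c):
    """Matrix P_* A_* + c * grad_g_* grad_g_*^T from (5.12)."""
    n = grad_g.shape[0]
    P = np.eye(n) - grad_g @ np.linalg.solve(grad_g.T @ grad_g, grad_g.T)
    return P @ A_star + c * grad_g @ grad_g.T

A_star = np.eye(2)                     # hypothetical positive definite Hessian of the Lagrangian
grad_g = np.array([[-2.0], [-2.0]])    # hypothetical full-rank constraint gradient

det_c0 = np.linalg.det(H_prime(A_star, grad_g, 0.0))   # singular case, c = 0
det_c1 = np.linalg.det(H_prime(A_star, grad_g, 1.0))   # nonsingular case, c = 1
```

With $c = 0$ the matrix reduces to $P_*A_*$, which has rank $n - m$ and is therefore singular; for $c = 1$ the determinant in this example is 8.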

THEOREM 5.3. A necessary and sufficient condition for $\{x_k\}$ to converge q-superlinearly to $x_*$ is

$$\lim_{k\to\infty} \frac{|P_k(B_k - A_*)s_k|}{|s_k|} = 0, \tag{5.13}$$

and

$$\lim_{k\to\infty} \frac{|g_k + \nabla g_k^T s_k|}{|s_k|} = 0. \tag{5.14}$$

Proof. From (1.4) we have

$$B_ks_k + \nabla_x L(x_k,\lambda_{k+1},c) = 0;$$

hence

$$-\nabla_x L(x_k,\lambda_*,c) = B_ks_k + \nabla g_k(\lambda_{k+1} - \lambda_*).$$

Multiplying our last expression by $P_k$ and observing that $P_k\nabla g_k = 0$ we obtain

$$P_kB_ks_k + P_k\nabla_x l(x_k,\lambda_*) = 0.$$

Adding on both sides $-H_c(x_{k+1})$ yields

$$-H_c(x_{k+1}) = P_kB_ks_k - \left[H_c(x_{k+1}) - H_c(x_k)\right] - c\,\nabla g_*\,g_k.$$

Using (5.12) we obtain

$$\begin{aligned}
-H_c(x_{k+1}) &= -\left[H_c(x_{k+1}) - H_c(x_k) - H_c'(x_*)s_k\right] \\
&\quad - c\,\nabla g_*\left[g_k + \nabla g_*^T s_k\right] \\
&\quad + (P_k - P_*)A_*s_k \\
&\quad + P_k(B_k - A_*)s_k.
\end{aligned} \tag{5.15}$$


From Lemma 2.1, and Lemma 5.2 there exist for $c \ne 0$, $\beta \ge \alpha > 0$ such that

$$\beta\,|x_{k+1} - x_*| \ge |H_c(x_{k+1})| \ge \alpha\,|x_{k+1} - x_*| \tag{5.16}$$

for $k$ sufficiently large. Also from Lemma 2.3 there exists a positive $K_1$ such that

$$|H_c(x_{k+1}) - H_c(x_k) - H_c'(x_*)s_k| \le K_1\sigma(x_k,x_{k+1})\,|s_k|. \tag{5.17}$$

Consider the condition

$$\lim_{k\to\infty} \frac{|H_c(x_{k+1})|}{|s_k|} = 0. \tag{5.18}$$

By (5.16) we see that (5.18) is equivalent to

$$\lim_{k\to\infty} \frac{|x_{k+1} - x_*|}{|s_k|} = 0, \tag{5.19}$$

which (as argued in the proof of Theorem 5.1) in turn is equivalent to (5.1) and q-superlinear convergence.

Assume (5.13) and (5.14). Divide (5.15) by $|s_k|$, recall (5.17) and take limits to obtain (5.18). This establishes the q-superlinear convergence of $\{x_k\}$ to $x_*$.

Let us now assume (5.1). We first argue that this assumption implies (5.14). To see this we write

$$g_k + \nabla g_k^T s_k = -\left(g_{k+1} - g_k - \nabla g_k^T s_k\right) + \left(g_{k+1} - g_*\right). \tag{5.20}$$

Dividing (5.20) by $|s_k|$, calling on Lemma 2.2 to bound the first term on the right-hand side of (5.20) and the mean-value theorem to bound the second term we have

$$\lim_{k\to\infty} \frac{|g_k + \nabla g_k^T s_k|}{|s_k|} \le K \lim_{k\to\infty} \frac{|x_{k+1} - x_*|}{|s_k|} \tag{5.21}$$

for some positive constant $K$. If we now recall the fact that (5.1) is equivalent to (5.19) we see that (5.21) implies (5.14).

We are assuming (5.1) holds or equivalently (5.18). We have established (5.14). This means that if we divide (5.15) by $|s_k|$ and recall (5.14), (5.17) and (5.18) when taking limits we obtain (5.13). □

It  is  of  considerable  interest  to  see  what  Theorem  5.3  gives  when  the  multiplier 
update  is  the  Newton  update  (1.12).  Recall  that  in  this  case  the  DMM  is  equivalent  to 
Successive  Quadratic  Programming.  Theorem  10.2  of  Tapia  [40]  says  that  the  DMM 
using  the  Newton  multiplier  update  satisfies  linearized  constraints,  i.e.,  (5.14)  holds.  This 
gives  us  the  following  corollary  to  Theorem  5.3. 

COROLLARY 5.4. Let the sequence $\{x_k\}$ be generated using the DMM (1.3)-(1.6) with the Newton multiplier update formula (1.12). Then a necessary and sufficient condition for $\{x_k\}$ to converge q-superlinearly to $x_*$ is

$$\lim_{k\to\infty} \frac{|P_k(B_k - A_*)s_k|}{|s_k|} = 0. \tag{5.22}$$

Corollary  5.4  is  the  Boggs,  Tolle  and  Wang  [5]  characterization  theorem  discussed 
in  Section  1. 

The following Corollary says that the most popular secant updates, Broyden, PSB, DFP and BFGS, which are known to give local q-superlinear convergence in the case of unconstrained optimization, give local q-superlinear convergence in $x$ in the case of constrained optimization provided one uses the Newton multiplier update.

By using an obvious weighting and the infinity norm it is not difficult to see that if one obtains q-superlinear convergence in $x$ using an x-dominated multiplier update, then q-superlinear convergence in the pair $(x,\lambda)$ follows. The converse is not necessarily true.

The fact that the DMM using the Newton update, equivalently SQP, gives q-superlinear convergence in the pair $(x,\lambda)$ was established by Han [26], Tapia [39] and Glad [25]. That one also obtains q-superlinear convergence in $x$ alone was established by Boggs, Tolle and Wang [5].

COROLLARY 5.5. Consider the DMM using the updates $U$ and $B$ where $U$ is x-dominated and $B$ is either the PSB, the DFP or the BFGS secant update. Assume that in the case of DFP and BFGS secant updates the matrix $A_*^c$ is positive definite. Then the DMM is locally q-linearly convergent in $x$. Moreover, if $U$ is the Newton multiplier update then we also have q-superlinear convergence in $x$.

Proof. Implicit in the works of Han [26], Glad [25], Tapia [39] and Boggs, Tolle and Wang [5] for the Broyden, PSB and DFP secant updates is an inequality of the form

$$\|B_{k+1} - A_*^c\| \le \left[1 + \alpha_1\sigma(x_k,x_{k+1})\right]\|B_k - A_*^c\| + \beta_1\sigma(x_k,x_{k+1}) + \beta_2\,|\lambda_{k+1} - \lambda_*| \tag{5.23}$$

where $\alpha_1 \ge 0$, $\beta_1 \ge 0$ and $\beta_2 \ge 0$ and as before

$$\sigma(x_k,x_{k+1}) = \max\{|x_{k+1} - x_*|,\ |x_k - x_*|\}.$$

Now using the assumption that the multiplier update $U$ is x-dominated we can write (5.23) as

$$\|B_{k+1} - A_*^c\| \le \left[1 + \alpha_1\sigma(x_k,x_{k+1})\right]\|B_k - A_*^c\| + \alpha_2\sigma(x_k,x_{k+1}). \tag{5.24}$$

From Proposition 4.2 we have local q-linear convergence. Now, an argument identical to the one used by Broyden, Dennis and Moré [6] can be used to establish

$$\lim_{k\to\infty} \frac{|(B_k - A_*^c)s_k|}{|s_k|} = 0. \tag{5.25}$$

By observing that $P_k\nabla g_k = 0$ and $|P_k| = 1$ we get

$$|P_k(B_k - A_*)s_k| \le |P_k(B_k - A_*^c)s_k| + c\,|\nabla g_*|\,|\nabla g_* - \nabla g_k|\,|s_k|. \tag{5.26}$$

Dividing (5.26) by $|s_k|$, taking limits and using (5.25) we get (5.22).

The proof for the BFGS update requires one to work with the inverse update. The proof is then essentially the same as that for the DFP (see Broyden, Dennis and Moré [6]). □

It is of interest to emphasize that the DMM using the projection multiplier update and the secant updates listed above satisfies (5.25) but does not lead to superlinear convergence. This fact should enhance the appreciation for Theorem 5.3.